Representing Videos based on Scene Layouts for Recognizing Agent-in-Place Actions
نویسندگان
چکیده
We address the recognition of agent-in-place actions, which are associated with agents who perform them and places where they occur, in the context of outdoor home surveillance. We introduce a representation of the geometry and topology of scene layouts so that a network can generalize from the layouts observed in the training set to unseen layouts in the test set. This Layout-Induced Video Representation (LIVR) abstracts away low-level appearance variance and encodes geometric and topological relationships of places in a specific scene layout. LIVR partitions the semantic features of a video clip into different places to force the network to learn place-based feature descriptions; to predict the confidence of each action, LIVR aggregates features from the place associated with an action and its adjacent places on the scene layout. We introduce the Agent-inPlace Action dataset to show that our method allows neural network models to generalize significantly better to unseen scenes.
منابع مشابه
Compressed Domain Scene Change Detection Based on Transform Units Distribution in High Efficiency Video Coding Standard
Scene change detection plays an important role in a number of video applications, including video indexing, searching, browsing, semantic features extraction, and, in general, pre-processing and post-processing operations. Several scene change detection methods have been proposed in different coding standards. Most of them use fixed thresholds for the similarity metrics to determine if there wa...
متن کاملRepresentation and Visual Recognition of Complex , Multi - agent Actions using Belief
A probabilistic framework for representing and visually recognizing complex multi-agent action is presented. Motivated by work in model-based object recognition and designed for the recognition of action from visual evidence , the representation has three components: (1) temporal structure descriptions representing the logical and temporal relationships between agent goals, (2) belief networks ...
متن کاملRepresentation and Visual Recognition of Complex, Multi-agent Actions using Belief Networks
A probabilistic framework for representing and visually recognizing complex multi-agent action is presented. Motivated by work in model-based object recognition and designed for the recognition of action from visual evidence, the representation has three components: (1) temporal structure descriptions representing the logical and temporal relationships between agent goals, (2) belief networks f...
متن کاملGenerating Videos with Scene Dynamics
We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction). We propose a generative adversarial network for video with a spatio-temporal convolutional architecture that untangles the scene’s foreground from the background. Experiments suggest this ...
متن کاملTrajectory aligned features for first person action recognition
Egocentric videos are characterised by their ability to have the first person view. With the popularity of Google Glass and GoPro, use of egocentric videos is on the rise. Recognizing action of the wearer from egocentric videos is an important problem. Unstructured movement of the camera due to natural head motion of the wearer causes sharp changes in the visual field of the egocentric camera c...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2018